
Conversation

Member

@LinZhihao-723 LinZhihao-723 commented Oct 27, 2025

Description

As the title suggests.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows pass.
  • Ensure the generated doc can be properly rendered.

Summary by CodeRabbit

  • Documentation
    • Added a two‑mode S3 compression guide describing object‑level and key‑prefix workflows and their URL formats.
    • Clarified explicit mode selection, per‑mode argument details, and updated inline examples and links.
    • Documented supplying multiple object URLs via an input file for object mode.
    • Enforced and documented a non‑empty common‑prefix requirement for key‑prefix mode and added navigation anchors.

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner October 27, 2025 19:26
Contributor

coderabbitai bot commented Oct 27, 2025

Walkthrough

Replaces the single-mode S3 compression guidance with a two-mode `sbin/compress-from-s3.sh`: `s3-object` (a single object, or a file of object URLs via `inputs-from`; requires a non-empty common prefix) and `s3-key-prefix` (a single key prefix). Adds mode-specific arguments, URL formats, examples, validations, and updated anchors/links.
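The walkthrough mentions two object-URL formats (virtual-hosted-style and path-style). As an illustration only, a URL in either format can be decomposed into bucket, region, and key with a small parser; the function name `parse_s3_url` and the example bucket/key names below are hypothetical and not part of CLP's scripts:

```python
# Sketch: decompose the two S3 URL formats the guide documents.
# Illustrative only; this code is not part of compress-from-s3.sh.
import re

# Virtual-hosted-style: https://<bucket-name>.s3.<region-code>.amazonaws.com/<key>
_VIRTUAL = re.compile(
    r"^https://(?P<bucket>[^./]+)\.s3\.(?P<region>[^.]+)\.amazonaws\.com/(?P<key>.+)$"
)
# Path-style: https://s3.<region-code>.amazonaws.com/<bucket-name>/<key>
_PATH = re.compile(
    r"^https://s3\.(?P<region>[^.]+)\.amazonaws\.com/(?P<bucket>[^/]+)/(?P<key>.+)$"
)


def parse_s3_url(url: str) -> dict:
    """Return {'bucket', 'region', 'key'} for either URL format, or raise ValueError."""
    for pattern in (_VIRTUAL, _PATH):
        match = pattern.match(url)
        if match:
            return match.groupdict()
    raise ValueError(f"Unrecognized S3 URL: {url}")


print(parse_s3_url("https://my-bucket.s3.us-east-1.amazonaws.com/logs/a.log"))
print(parse_s3_url("https://s3.us-east-1.amazonaws.com/my-bucket/logs/a.log"))
```

Both calls above resolve to the same bucket, region, and key, which is the point of supporting either format.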

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **S3 Compression Documentation Update**<br>`docs/src/user-docs/guides-using-object-storage/clp-usage.md` | Replaces single-mode guidance with two-mode `sbin/compress-from-s3.sh` usage (`s3-object`, `s3-key-prefix`); renames/expands args (`url`, `object-key`, `inputs-from`); adds `inputs-from` support (a file of object URLs) for `s3-object`; documents object-key URL and key-prefix URL formats; requires a non-empty common prefix for `s3-object`; updates examples, anchors, and links. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User as User/CI
    participant CLI as sbin/compress-from-s3.sh
    participant S3 as S3 (URL / Key-Prefix)
    participant Compressor as Compression pipeline

    Note over User,CLI #DDEBF7: User selects mode and supplies args
    User->>CLI: invoke with mode=`s3-object` or `s3-key-prefix`\nargs: `url` / `object-key` / `inputs-from`
    alt s3-object (single-object or inputs-from)
        CLI->>S3: resolve object URLs (validate non-empty common prefix)
        S3-->>CLI: object(s) metadata / URLs
    else s3-key-prefix
        CLI->>S3: list objects by single key-prefix
        S3-->>CLI: object list
    end
    CLI->>Compressor: stream selected objects to compression pipeline
    Compressor-->>User: output archive/result
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify mode names (s3-object, s3-key-prefix) and renamed args (url, object-key, inputs-from) are consistent across the doc.
  • Check inputs-from examples use the new object-key URL format and cover single and multi-object cases.
  • Confirm documentation enforces non-empty common-prefix validation for s3-object and single-key-prefix constraint for s3-key-prefix.
  • Validate updated anchors/links point to correct sections.
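As a concrete illustration of the multi-object case the checklist refers to, a hypothetical `--inputs-from` file and invocation might look like the following. The bucket, region, and key names are placeholders; the mode name and flag come from the summary above, and the script itself is only echoed, not executed:

```shell
# Hypothetical inputs file for s3-object mode: one object URL per line,
# all sharing a non-empty common key prefix ("logs/app/" here).
cat > object-urls.txt <<'EOF'
https://my-bucket.s3.us-east-1.amazonaws.com/logs/app/2025-10-01.log
https://my-bucket.s3.us-east-1.amazonaws.com/logs/app/2025-10-02.log
EOF

# Shape of the invocation (echoed rather than run, since the script is not
# available here).
echo "sbin/compress-from-s3.sh s3-object --inputs-from object-urls.txt"
```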

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "docs(clp-package): Rewrite S3 log compression guide to reflect new API and script features." is fully aligned with the changeset: the changes rewrite the S3 log compression guide to reflect new script features, including the replacement of `compress.sh` with `compress-from-s3.sh` supporting multiple modes, new `inputs-from` support, and updated examples. The title is clear, specific, and concise, and uses the conventional-commit `docs` scope. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a33db9 and aa94167.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-object-storage/clp-usage.md (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1122
File: components/core/src/clp/clp/CMakeLists.txt:175-195
Timestamp: 2025-07-23T09:54:45.185Z
Learning: In the CLP project, when reviewing CMakeLists.txt changes that introduce new compression library dependencies (BZip2, LibLZMA, LZ4, ZLIB), the team prefers to address conditional linking improvements in separate PRs rather than expanding the scope of focused migration PRs like the LibArchive task-based installation migration.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/scripts/start_clp.py:613-613
Timestamp: 2025-08-08T06:59:42.436Z
Learning: In components/clp-package-utils/clp_package_utils/scripts/start_clp.py, generic_start_scheduler sets CLP_LOGGING_LEVEL using clp_config.query_scheduler.logging_level for both schedulers; compression scheduler should use its own logging level. Tracking via an issue created from PR #1152 discussion.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/clp-usage.md
📚 Learning: 2025-09-28T15:00:22.170Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 1340
File: components/job-orchestration/job_orchestration/executor/compress/compression_task.py:528-528
Timestamp: 2025-09-28T15:00:22.170Z
Learning: In components/job-orchestration/job_orchestration/executor/compress/compression_task.py, there is a suggestion to refactor from passing logger as a parameter through multiple functions to creating a ClpCompressor class that takes the logger as a class member, with current helper functions becoming private member functions.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/clp-usage.md
📚 Learning: 2025-06-18T20:39:05.899Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Applied to files:

  • docs/src/user-docs/guides-using-object-storage/clp-usage.md
🪛 LanguageTool
docs/src/user-docs/guides-using-object-storage/clp-usage.md

[uncategorized] ~91-~91: Possible missing comma found.
Context: ...is the prefix of all logs you wish to compress and must begin with the <all-logs-...

(AI_HYDRA_LEO_MISSING_COMMA)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
docs/src/user-docs/guides-using-object-storage/clp-usage.md (4)

8-14: Mode selection section is clear and well-structured.

The introduction effectively describes the two modes with embedded links to the corresponding detailed sections. The consistent formatting and use of bold labels for mode names enhances readability and makes it easy for users to navigate.


16-67: s3-object compression mode section is comprehensive and well-documented.

The section clearly explains the purpose of s3-object mode, provides command examples for both direct URL and --inputs-from file input, documents URL formats and field definitions, and includes appropriate warnings about duplicate object keys and the common-prefix requirement. The cross-reference to the clp-json quick-start guide for additional parameters (e.g., --timestamp-key, --dataset) avoids redundancy while providing users with necessary context.


69-97: s3-key-prefix compression mode section is well-documented and consistent.

The section mirrors the structure of the s3-object mode, clearly explains the mode's purpose, provides the command example, documents URL formats and field definitions, and includes a note about the single-argument limitation. The documentation is internally consistent and provides users with all necessary information to use this mode.


99-101: Link references are well-organized and appear correct.

The link reference definitions are in alphabetical order and support the documentation anchors throughout the file. The aws-region-codes reference provides appropriate context for understanding AWS region codes in the S3 context.



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69fbe45 and 1e7fc13.

📒 Files selected for processing (1)
  • docs/src/user-docs/guides-using-object-storage/clp-usage.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/src/user-docs/guides-using-object-storage/clp-usage.md

[uncategorized] ~10-~10: Loose punctuation mark.
Context: ... two modes of operation: * s3-object : Compress S3 objects specified by their ...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~11-~11: Loose punctuation mark.
Context: ...y their full S3 URLs. * s3-key-prefix : Compress all S3 objects under a given S...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~80-~80: Possible missing comma found.
Context: ...is the prefix of all logs you wish to compress and must begin with the <all-logs-...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/guides-using-object-storage/clp-usage.md

88-88: Link and image reference definitions should be needed
Unused link or image reference definition: "add-iam-policy"

(MD053, link-image-reference-definitions)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (1)
docs/src/user-docs/guides-using-object-storage/clp-usage.md (1)

52-56: Verify documentation clarity for dual-mode constraints.

The notes correctly document current limitations (common prefix requirement for s3-object, single URL constraint for s3-key-prefix) and indicate these may be relaxed in future releases. This is clear, though you may want to consider highlighting the common prefix validation more prominently in the s3-object section since it could cause job failures.

Confirm whether the common prefix validation requirement is prominently enough communicated to users, especially given that objects can be specified via --inputs-from file (which could make the validation requirement less obvious).

Also applies to: 83-86

* `<bucket-name>` is the name of the S3 bucket containing your logs.
* `<region-code>` is the AWS region [code][aws-region-codes] for the S3 bucket containing your
logs.
* `<key-prefix>` is the prefix of all logs you wish to compress and must begin with the
Contributor

🧹 Nitpick | 🔵 Trivial

Add clarifying comma for readability.

A comma after "compress" improves clarity of the compound modifier.

Apply this diff to improve readability:

```diff
-  * `<key-prefix>` is the prefix of all logs you wish to compress and must begin with the
+  * `<key-prefix>` is the prefix of all logs you wish to compress, and must begin with the
```

🤖 Prompt for AI Agents
In docs/src/user-docs/guides-using-object-storage/clp-usage.md around line 80,
the sentence "* `<key-prefix>` is the prefix of all logs you wish to compress
and must begin with the" is missing a clarifying comma after "compress"; update
the line to insert a comma after "compress" so the compound modifier reads "logs
you wish to compress, and must begin with the" to improve readability.

Comment on lines 10 to 11
* `s3-object` : Compress S3 objects specified by their full S3 URLs.
* `s3-key-prefix` : Compress all S3 objects under a given S3 key prefix.
Member Author

The space is on purpose to make it look nicer:
[screenshot of the rendered list omitted]

Comment on lines 53 to 55
`s3-object` mode requires the input object keys to share a non-empty common prefix. If the input
object keys do not share a common prefix, they will be rejected and no compression job will be
created. This limitation will be relaxed in a future release.
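The non-empty common-prefix requirement quoted above could be checked along these lines. This is a sketch only, not CLP's actual validation code; the function name and example keys are hypothetical:

```python
# Sketch of the non-empty common-prefix check described in the doc excerpt above.
# Illustrative only; not taken from CLP's implementation.
import os


def keys_share_common_prefix(object_keys: list[str]) -> bool:
    """Return True if all object keys share a non-empty common prefix."""
    if not object_keys:
        return False
    # os.path.commonprefix does character-wise comparison, which is what a
    # key-prefix check needs (keys are plain strings, not filesystem paths).
    return len(os.path.commonprefix(object_keys)) > 0


print(keys_share_common_prefix(["logs/app/a.log", "logs/app/b.log"]))  # True
print(keys_share_common_prefix(["logs/a.log", "metrics/b.log"]))       # False
```

Per the doc, inputs failing this check would be rejected before any compression job is created.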
Member Author

I'm not sure whether we should mention `<all-logs-prefix>` as we did when discussing key-prefix compression.

Collaborator

@quinntaylormitchell quinntaylormitchell left a comment

hey @LinZhihao-723, looking good, a few comments to address.

Collaborator

@quinntaylormitchell quinntaylormitchell left a comment

a few more nit comments to tighten it up, but otherwise lgtm!

Collaborator

@quinntaylormitchell quinntaylormitchell left a comment

lgtm!

Comment on lines 18 to 19
`s3-object` mode allows you to specify individual S3 objects to compress by their full URLs. To
use this mode, call the `sbin/compress-from-s3.sh` script as follows, and replace the fields in
Member

Suggested change:

```diff
-`s3-object` mode allows you to specify individual S3 objects to compress by their full URLs. To
-use this mode, call the `sbin/compress-from-s3.sh` script as follows, and replace the fields in
+The `s3-object` mode allows you to specify individual S3 objects to compress by using their full
+URLs. To use this mode, call the `sbin/compress-from-s3.sh` script as follows, replacing fields in
```

Collaborator

imo, the change from "and replace the" to "replacing" weakens the tone of the sentence, because "replacing" sort of dangles there (even though it is technically agreeing with the subject). but neither one is wrong, so I guess it is up to personal taste.

* `https://<bucket-name>.s3.<region-code>.amazonaws.com/<prefix>`
* `https://s3.<region-code>.amazonaws.com/<bucket-name>/<prefix>`
* The fields in `<url>` are as follows:
* `<object-url>` is a URL identifying the S3 object to compress. It can be written in either of two
Member

Suggested change:

```diff
-* `<object-url>` is a URL identifying the S3 object to compress. It can be written in either of two
+* `<object-url>` is a URL identifying the S3 object to compress. It can be written in one of two
```

Collaborator

I suggested changing to "either" because it more readily indicates that the user can write the URL in either one of the two formats. having just "one of two" implies that only one of the formats will be valid in each use case, and that the user has to figure out which one to use.

Comment on lines 64 to 66
`s3-key-prefix` mode allows you to compress all objects under a given S3 key prefix. To use this
mode, call the `sbin/compress-from-s3.sh` script as follows, and replace the fields in angle
brackets (`<>`) with the appropriate values:
Member

Suggested change:

```diff
-`s3-key-prefix` mode allows you to compress all objects under a given S3 key prefix. To use this
-mode, call the `sbin/compress-from-s3.sh` script as follows, and replace the fields in angle
-brackets (`<>`) with the appropriate values:
+The `s3-key-prefix` mode allows you to compress all objects under a given S3 key prefix. To use this
+mode, call the `sbin/compress-from-s3.sh` script as follows, replacing fields in angle brackets
+(`<>`) with the appropriate values:
```

Collaborator

same comment as above re. "replacing"

Comment on lines +76 to +77
* `<key-prefix-url>` is a URL identifying the S3 key prefix to compress. It can be written in either
of two formats:
Member

Suggested change:

```diff
-* `<key-prefix-url>` is a URL identifying the S3 key prefix to compress. It can be written in either
-of two formats:
+* `<key-prefix-url>` is a URL identifying the S3 key prefix to compress. It can be written in one of
+two formats:
```

Collaborator

same comment as above re. "either"

@LinZhihao-723 LinZhihao-723 merged commit 4281c1a into y-scope:main Nov 1, 2025
6 of 7 checks passed